A Fast Normalized Maximum Likelihood Algorithm for Multinomial Data
Authors
Abstract
Stochastic complexity of a data set is defined as the shortest possible code length for the data obtainable by using some fixed set of models. This measure is of great theoretical and practical importance as a tool for tasks such as model selection or data clustering. In the case of multinomial data, computing the modern version of stochastic complexity, defined as the Normalized Maximum Likelihood (NML) criterion, requires computing a sum with an exponential number of terms. Furthermore, in order to apply NML in practice, one often needs to compute a whole table of these exponential sums. In our previous work, we were able to compute this table by a recursive algorithm. The purpose of this paper is to significantly improve the time complexity of this algorithm. The techniques used here are based on the discrete Fourier transform and the convolution theorem.
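The sketch below is an illustration built on the abstract's description, not the paper's algorithm. For K categories and n observations, the NML normalizing sum (the "regret" term) is C(K, n) = Σ n!/(h_1!···h_K!) ∏_k (h_k/n)^{h_k}, where the sum runs over all count vectors with h_1 + ··· + h_K = n. The number of terms grows combinatorially with K. The code relies on a standard identity for this sum that comes from splitting the categories into two groups of sizes K1 and K2: C(K1+K2, n) = Σ_{r1+r2=n} n!/(r1! r2!) (r1/n)^{r1} (r2/n)^{r2} C(K1, r1) C(K2, r2). The abstract does not say that this is the recursion used in the earlier work, so treat that connection as an assumption. Pulling the factor n!/n^n out of the sum leaves a plain convolution of the sequences r^r/r! · C(K_i, r). By the convolution theorem, one FFT-based product therefore gives C(K1+K2, n) for every n at once. All function names are hypothetical.

```python
import cmath
import math


def compositions(n, K):
    """Yield every K-tuple of non-negative integers that sums to n."""
    if K == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, K - 1):
            yield (first,) + rest


def brute_force_regret(K, n):
    """C(K, n) by direct summation over all binom(n+K-1, K-1) count vectors."""
    if n == 0:
        return 1.0  # only the empty data set, with the convention 0^0 = 1
    total = 0.0
    for h in compositions(n, K):
        coeff = math.factorial(n)
        for hk in h:
            coeff //= math.factorial(hk)  # exact multinomial coefficient
        lik = 1.0
        for hk in h:
            lik *= (hk / n) ** hk  # maximized likelihood; 0.0 ** 0 == 1.0
        total += coeff * lik
    return total


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def convolve(a, b):
    """Linear convolution of two real sequences via the convolution theorem."""
    length = len(a) + len(b) - 1
    size = 1
    while size < length:
        size *= 2
    fa = fft([complex(x) for x in a] + [0j] * (size - len(a)))
    fb = fft([complex(x) for x in b] + [0j] * (size - len(b)))
    res = fft([x * y for x, y in zip(fa, fb)], invert=True)
    return [v.real / size for v in res[:length]]  # inverse FFT needs the 1/size factor


def log_weight(r):
    """log of r^r / r! * e^{-r}; the e^{-r} factor keeps values near 1/sqrt(2*pi*r)."""
    if r == 0:
        return 0.0
    return r * math.log(r) - math.lgamma(r + 1) - r


def combine(C1, C2):
    """From rows C1[r] = C(K1, r), C2[r] = C(K2, r), r = 0..N, return C(K1+K2, n), n = 0..N."""
    N = len(C1) - 1
    a = [C1[r] * math.exp(log_weight(r)) for r in range(N + 1)]
    b = [C2[r] * math.exp(log_weight(r)) for r in range(N + 1)]
    conv = convolve(a, b)  # O(N log N) instead of the O(N^2) direct sum
    return [conv[n] * math.exp(-log_weight(n)) for n in range(N + 1)]


def regret_table(K_max, N):
    """Rows K = 1..K_max of C(K, n), n = 0..N, built by merging one category at a time."""
    base = [1.0] * (N + 1)  # C(1, n) = 1: one category always has likelihood 1
    table = {1: base}
    for K in range(2, K_max + 1):
        table[K] = combine(table[K - 1], base)
    return table
```

For example, regret_table(2, 2)[2][2] evaluates to approximately 2.5, which matches the direct sum 1 + 0.5 + 1. Multiplying by e^{-r} before the transform is a design choice: it keeps every sequence entry of moderate size. Without it the entries would grow like e^r, and the floating-point FFT would lose accuracy at small n.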
Similar resources
Exact Maximum Likelihood Estimation for Word Mixtures
The mixture model for generating documents is a generative language model used in information retrieval. When using this model, there are situations in which we need to find the maximum likelihood estimate of the density of one multinomial, given a fixed mixture weight and the densities of the other multinomial. In this paper, we provide an exact solution and a quick algorithm to solve this problem...
Achievability of asymptotic minimax regret by horizon-dependent and horizon-independent strategies
The normalized maximum likelihood distribution achieves minimax coding (log-loss) regret given a fixed sample size, or horizon, n. It generally requires that n be known in advance. Furthermore, extracting the sequential predictions from the normalized maximum likelihood distribution is computationally infeasible for most statistical models. Several computationally feasible alternative strategie...
Computing the Regret Table for Multinomial Data
Stochastic complexity of a data set is defined as the shortest possible code length for the data obtainable by using some fixed set of models. This measure is of great theoretical and practical importance as a tool for tasks such as model selection or data clustering. In the case of multinomial data, computing the modern version of stochastic complexity, defined as the Normalized Maximum Likeli...
Parent Assignment Is Hard for the MDL, AIC, and NML Costs
Several hardness results are presented for the parent assignment problem: Given m observations of n attributes x_1, ..., x_n, find the best parents for x_n, that is, a subset of the preceding attributes so as to minimize a fixed cost function. This attribute or feature selection task plays an important role, e.g., in structure learning in Bayesian networks, yet little is known about its computa...
Bearing Fault Detection Based on Maximum Likelihood Estimation and Optimized ANN Using the Bees Algorithm
Rotating machinery is the most common machinery in industry. The root of the faults in rotating machinery is often faulty rolling element bearings. This paper presents a technique using optimized artificial neural network by the Bees Algorithm for automated diagnosis of localized faults in rolling element bearings. The inputs of this technique are a number of features (maximum likelihood estima...